Encrypted query-based access to data

ABSTRACT

A query-based system for sharing encrypted data, comprising at least one hardware processor; and at least one non-transitory memory device having embodied thereon instructions executable by the at least one hardware processor to: receive a file and a plaintext tag and provide secure access to the file using the plaintext tag, and, responsive to receiving a search query matching the plaintext tag, securely retrieve the file, wherein providing secure access to the file comprises: encrypting the file into multiple portions, storing each portion separately, deriving multiple differently encrypted ciphertexts by encrypting the plaintext tag multiple times, separately indexing each portion using a different one of the ciphertexts, wherein securely retrieving the file comprises: deriving multiple differently encrypted search queries by encrypting the search query multiple times, querying using the multiple encrypted search queries, retrieving at least some of the multiple portions, and recovering the file from the retrieved portions.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 15/161,563, filed May 23, 2016, which claims the benefit of priority of U.S. Provisional Patent Application No. 62/164,566, filed May 21, 2015, and entitled “Authorized Cloud-Based Access to Data”, the contents of which are incorporated herein by reference in their entirety, and of U.S. Provisional Patent Application No. 62/238,726, filed Oct. 8, 2015, and entitled “Authorized Access to Data over a Network”, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

This invention relates to data security.

The ability to share data over the cloud or Internet has allowed users to collaborate and communicate in a manner that could not have been contemplated until very recently. The convenience of sharing data has made state-of-the-art data distribution and sharing services indispensable in many scenarios ranging from media, business, education, government, social applications and more. Today, anyone with a handheld mobile phone can instantly share their vacation photos with distant friends and family. Coworkers separated by thousands of kilometers can simultaneously edit the same file, and schools can upload the latest edition of a textbook and make a single digital copy available to hundreds of students, saving printing and paper costs.

However, together with the benefits of sharing information comes the downside of a loss of privacy. Data stored publicly on the ‘cloud’ is vulnerable to intrusions, hackers, and espionage, exposing users to privacy violations, blackmail, and threats that were never imaginable in the past. Moreover, the interconnectedness facilitated by the Internet has greatly exacerbated the damage caused by privacy breaches by enabling leaked information to spread in mere seconds to millions of people across the world.

This has come to public attention over the last few years as children having come of age with the Internet discover that their adolescence has left digital footprints for college admissions staff and potential employers to scrutinize. Recent highly publicized scandals have spotlighted the growing problem of intrusions into data storage platforms and the resulting exposure of private data.

When data sharing over the cloud was first implemented, a common defense to counter privacy concerns was that the sheer quantity of available data protected users from having their data divulged; the pile of data was so deep, it would be impossible to mine any individual's personal information. However, this argument failed to account for the fact that data stored on the cloud is tagged and indexed. Rather than a random heap, the data is highly mapped and networked, and therefore accessible using simple search techniques.

Some social media platforms try to overcome these issues by allowing users to create closed or private circles of ‘friends’ for sharing data. However, this requires all the interacting friends to join that particular social media platform, something they are not always willing to do. Furthermore, users' privacy is at the mercy of any given platform's security measures, and as these platforms become bigger and draw more users, their attraction increases as targets for attacks.

The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.

SUMMARY

The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools and methods which are meant to be exemplary and illustrative, not limiting in scope.

There is provided, in accordance with an embodiment, a query-based system for sharing encrypted data, comprising: a first client; a second client; a storage device; an index; and a network, wherein each of the first and second clients is configured to communicate with the storage device and the index via the network, wherein the first client is configured to receive a file and a plaintext tag and provide the second client with secure access to the file via the network using the plaintext tag, and wherein the second client is configured to receive a search query comprising the plaintext tag, and use the search query to securely retrieve the file via the network, wherein the first client is configured to provide the second client with secure access to the file by: encrypting the file into multiple encrypted portions, separately storing each encrypted portion at the storage device, deriving multiple ciphertexts by encrypting the plaintext tag using multiple different encryption keys, and separately indexing at the index each stored encrypted portion with a different one of the ciphertexts, and wherein the second client is configured to securely retrieve the file by deriving multiple encrypted search queries comprising the multiple ciphertexts by encrypting the search query using the multiple different encryption keys, separately submitting the multiple encrypted search queries to the index, separately retrieving the multiple encrypted portions from the storage device, and recovering the file from the multiple encrypted portions.

In some embodiments, the first client is configured to derive each ciphertext using a symmetric encryption algorithm and a unique combination of the encryption keys, and wherein separately indexing each stored encrypted portion comprises storing the different one of the ciphertexts with an encrypted storage location string of the stored encrypted portion, wherein the encrypted storage location string is derived using the symmetric encryption algorithm with the unique combination of the encryption keys, and wherein retrieving the multiple encrypted portions comprises, for each encrypted search query, retrieving the encrypted locator string of the portion, decrypting the encrypted locator string using the unique combination of the encryption keys used to derive the encrypted search query, and retrieving the encrypted file portion from the storage device using the decrypted locator string.

In some embodiments, the first client is configured to encrypt the file using an asymmetric encryption scheme, and wherein the second client is configured to recover the file using an asymmetric decryption scheme.

In some embodiments, the first client is further configured to provide the second client with the multiple different encryption keys.

In some embodiments, the first client is further configured to derive the multiple different encryption keys from a first key derivation function using a seed.

In some embodiments, the second client is further configured to derive the multiple different encryption keys from a second key derivation function using the seed.

In some embodiments, the first client is further configured to provide the second client with the seed over a channel that is independent of the network.

In some embodiments, the seed comprises the plaintext tag.

In some embodiments, the first client is configured to encrypt the file by applying a (q,n) threshold secret-sharing scheme wherein n is the number of stored encrypted portions and wherein q is the number of portions required to recover the file, and wherein the second client is configured to separately submit q separate encrypted search queries, and separately retrieve q encrypted portions.

In some embodiments, the first client is configured to provide the second client with a number m, wherein m is the number of portions required to recover the file, and wherein the second client is configured to encrypt the search query into m different encrypted search queries, separately submit the m encrypted search queries, and separately retrieve m encrypted portions.

In some embodiments, the number of multiple encryption keys u is fewer than the number of stored encrypted portions n.

In some embodiments, any of: n and u is determined according to the relationship n≤2^(u)−1, and wherein any of: deriving the multiple different ciphertexts and deriving the multiple encrypted search queries comprises, for each ciphertext and each encrypted search query, encrypting the plaintext ν times using a different one of the 2^(u)−1 non-null combinations of the u encryption keys, wherein ν is the cardinality of the combination.
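By way of non-limiting illustration, the relationship n≤2^(u)−1 may be sketched as follows. The sketch enumerates the non-null combinations of u keys and applies one keyed pass per key, so that a combination of cardinality ν yields ν passes. The function names, the placeholder keys, and the use of HMAC-SHA256 as a stand-in for the symmetric encryption step are assumptions made for illustration only and are not taken from the embodiments.

```python
import hashlib
import hmac
from itertools import combinations


def non_null_combinations(keys):
    """Yield every non-empty subset of the keys, smallest subsets first."""
    for size in range(1, len(keys) + 1):
        yield from combinations(keys, size)


def encrypt_with_combination(tag: bytes, combo) -> bytes:
    """Apply one keyed pass per key in the combination (nu passes in total).

    HMAC-SHA256 is an assumed stand-in for the symmetric encryption step.
    """
    out = tag
    for key in combo:
        out = hmac.new(key, out, hashlib.sha256).digest()
    return out


u = 3
keys = [bytes([i]) * 32 for i in range(1, u + 1)]  # placeholder keys, illustration only
combos = list(non_null_combinations(keys))
ciphertexts = [encrypt_with_combination(b"vacation", combo) for combo in combos]
```

With u=3 keys the sketch produces 2^(3)−1=7 combinations, and therefore up to 7 distinct index terms, so that 3 keys may index up to 7 portions. Both clients would need to apply the keys of a combination in the same order.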

In some embodiments, any of n and u are selected in accordance with a constraint imposed on the cardinalities of the combinations.

In some embodiments, the constraint comprises imposing a uniform distribution on the cardinalities.

In some embodiments, for each ciphertext, the first client is configured to encrypt the plaintext tag using a different one of multiple combinations of the encryption keys, and sort the encrypted file portions according to the combinations, and wherein for each encrypted search query, the second client is configured to encrypt the search query using the different one of the multiple combinations, and sort the retrieved portions according to the multiple combinations.

In some embodiments, the multiple combinations are determined according to the prime factors of the number of the stored encrypted portions, and wherein the number of the multiple encryption keys corresponds to the sum of the prime factors.

In some embodiments, the first client is further configured to separately index by indexing at multiple different indexes, and the second client is further configured to separately submit the multiple encrypted search queries by distributing the submissions over the multiple different indexes.

In some embodiments, the first client is instantiated on a first computing device and wherein the second client is instantiated on a second computing device.

In some embodiments, the first client is further configured to provide the second client with secure access to multiple files using the plaintext tag, and wherein the second client is further configured to securely retrieve the multiple files using the plaintext tag, wherein the multiple encrypted stored portions of the multiple files do not include any common identifying information.

In some embodiments, the file comprises a set of parameters associated with a device operative with the Internet of Things (IoT).

In addition to the exemplary aspects and embodiments described above, further aspects and embodiments will become apparent by reference to the figures and by study of the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments are illustrated in referenced figures. Dimensions of components and features shown in the figures are generally chosen for convenience and clarity of presentation and are not necessarily shown to scale. The figures are listed below:

FIGS. 1A-1G, taken together, show a system for sharing data, in accordance with an embodiment;

FIGS. 2A-2B illustrate a system for providing authorized access to data, in accordance with another embodiment;

FIG. 3 illustrates a system to index the multiple cipher-text/locator pairs, in accordance with an embodiment;

FIGS. 4A-4C show a flowchart of a method for sharing an encrypted file using a search query, in accordance with an embodiment; and

FIG. 5 shows an exemplary system for executing any of the methods described herein.

DETAILED DESCRIPTION

This invention applies a variation of a searchable symmetric encryption (SSE) scheme to access a file fragmented into multiple portions. Computation capacity has improved considerably such that merely encrypting a file does not necessarily provide protection from hackers. To address this, an encrypted file may be fragmented into multiple encrypted portions which are each stored separately. The portions may be created using any known technique, such as by partitioning an encrypted file into multiple fragments or ‘portions’, or by using a secure threshold secret-sharing algorithm. Since the portions cannot be used individually to recover the file without additional portions, scattering the portions may provide an additional security measure.

This invention proposes a method to retrieve statically stored data as fragmented and scattered portions via query. A two-tiered query-based method to share data is proposed: an outer plaintext tier, and an inner encrypted tier. The outer tier interfaces with the users, and provides an interface in keeping with conventional file sharing methods, allowing users to ‘tag’ a file with a plaintext term, ‘select’ a contact with whom to share the file, ‘upload’ the file, and subsequently ‘query’ for the plaintext tag to retrieve the file. Any of the many known graphical user interfaces (GUIs) that provide such services may be implemented for the outer tiers of the uploading client and receiving client applications. The inner tier interfaces with the network, storage device and index, such that the data flowing therebetween is both encrypted and fragmented. A conceptual diagram of the outer and inner tiers configured with an uploading client 110 and a receiving client 114 is shown in FIG. 1A.

The uploading client's outer tier receives from the user a file and a plaintext tag and provides them to the uploading client's inner tier, which encrypts and fragments the received data before uploading them to a platform where they can be accessed by the receiving client. The receiving client's outer tier receives a plaintext search query and provides it to the receiving client's inner tier, which uses the search query to retrieve the encrypted and fragmented data. The receiving client's inner tier then recovers the file from the fragmented data and provides it to the user via the outer tier. The users may be blind to the workings of the inner tier, allowing them to enjoy the familiar and intuitive user experience of the outer tier, which may be implemented using any of the many conventional file-sharing interfaces. This invention primarily describes several implementations for the inner tier.

The uploading client's inner tier encrypts and fragments the file, received from the outer tier, into portions. Additionally, the plaintext tag is encrypted multiple times using different keys to generate multiple different encrypted ciphertexts, or ‘signatures’, one for indexing each of the encrypted file portions. The encrypted file portions are uploaded over a network onto a storage device, and each portion is separately indexed according to one of the ciphertexts. Thus, from the perspective of the network, storage device and indexing service, the uploaded data appear as multiple, independent, and separately indexed files. Since both the file portions and the indexing terms are encrypted, the data uploaded by the uploading client do not include information that can indicate that the portions belong together or are associated with the same file, nor that the indexing terms belong together and are associated with the same file. This makes it hard for an unauthorized user to guess which portions and/or index terms are needed to recover the file. When many such files are fragmented and indexed, the problem of determining which portions belong to what file can become significantly more complex.

To retrieve the data, the receiving client's inner tier encrypts the plaintext search query ‘tag’, received from the outer tier, multiple times using the same keys as above, to generate the ciphertext indexing terms. These are submitted as multiple, independent search queries. The encrypted file portions are retrieved as multiple separate files, and recombined to recover the file, which is then provided to the user via the outer tier. Here too, from the perspective of the network, storage device and index, the queried data appear as multiple, independent, and separately indexed files.
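The upload and retrieval flow of the inner tier described above may be sketched, purely as an illustration, as follows. The sketch assumes that both clients already hold the same keys, uses a dictionary as a stand-in for the index and storage device, and uses HMAC-SHA256 as a stand-in for the deterministic encryption of the tag. The names `derive_index_terms`, `keys`, and `index` are hypothetical.

```python
import hashlib
import hmac


def derive_index_terms(tag: str, keys: list[bytes]) -> list[str]:
    """Derive one ciphertext index term per key from a single plaintext tag.

    HMAC-SHA256 is an assumed stand-in for the encryption of the tag.
    """
    data = tag.encode("utf-8")
    return [hmac.new(key, data, hashlib.sha256).hexdigest() for key in keys]


keys = [b"key-one", b"key-two", b"key-three"]  # placeholder keys shared by both clients
portions = [b"portion-0", b"portion-1", b"portion-2"]

# Uploading client: index each portion under a different ciphertext.
index = {}
for term, portion in zip(derive_index_terms("vacation", keys), portions):
    index[term] = portion

# Receiving client: re-derive the same terms and submit them as separate queries.
queries = derive_index_terms("vacation", keys)
retrieved = [index[query] for query in queries]
```

In this sketch the index sees only three unrelated-looking hexadecimal strings, while a client holding the keys and the tag reproduces them exactly. A query for a different tag yields terms that are absent from the index.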

Thus users interfacing with the outer tier handle plaintext tags and unencrypted files, making the system familiar and intuitive. The inner tier, interfacing with the network, storage and indexing services, handles encrypted and fragmented data, making the data hard to both discover and recover for a non-authorized agent.

Encryption in this system serves several functions in addition to merely encrypting the file: 1) it is used to encode the plaintext to generate multiple ciphertext indexing terms for the file portions; 2) since the indexing terms are generated via encryption, guessing the correct combination of indexing terms belonging to any one file is difficult; 3) since the ciphertext indexing terms are generated from a single plaintext, authorized users can easily derive them using familiar ‘tag’ and ‘query’ steps; and 4) the keys themselves may be used to position the retrieved file portions in order to recover the file.

This last function works as follows: to upload, the uploading client maps the encrypted file portions into the cells of an array, and assigns keys to index the cells of the array (i.e. row keys and column keys), such that a unique combination of keys indexes each cell. The indexing term used to index a portion is derived using the portion's unique combination of keys, by encrypting the plaintext multiple times using each key of the combination (i.e. encrypting twice using a row key and a column key). The receiving client does the same but in reverse: it creates an empty array with the same keys indexing the cells, as above. The search queries are derived the same way the indexing ciphertexts were derived by the uploading client: the search query for retrieving each portion is derived by encrypting the plaintext using the unique combination of keys (row, column) for the cells of the array. The retrieved portion is then inserted into the cell indexed by that unique combination of keys. Once the array is filled, the portions can be combined according to their order defined by the array to construct the encrypted file, which can then be decrypted.
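The array-based positioning described above may be sketched, as a non-limiting illustration, for a two-dimensional array with two row keys and three column keys. The keys, the portion contents, the dictionary standing in for the index and storage, and the use of HMAC-SHA256 in place of the two encryption passes are all assumptions of the sketch.

```python
import hashlib
import hmac
from itertools import product


def cell_term(tag: bytes, row_key: bytes, col_key: bytes) -> str:
    """Derive a cell's index term with two keyed passes: row key, then column key."""
    inner = hmac.new(row_key, tag, hashlib.sha256).digest()
    return hmac.new(col_key, inner, hashlib.sha256).hexdigest()


tag = b"vacation"
row_keys = [b"row-0", b"row-1"]            # placeholder keys
col_keys = [b"col-0", b"col-1", b"col-2"]  # placeholder keys
portions = [b"AA", b"BB", b"CC", b"DD", b"EE", b"FF"]  # 2 x 3 = 6 portions

# Uploading client: map portions into cells in row-major order and index each cell.
store = {}
for (r, rk), (c, ck) in product(enumerate(row_keys), enumerate(col_keys)):
    store[cell_term(tag, rk, ck)] = portions[r * len(col_keys) + c]

# Receiving client: build an empty array, query per cell, and fill it.
grid = [[None] * len(col_keys) for _ in row_keys]
for (r, rk), (c, ck) in product(enumerate(row_keys), enumerate(col_keys)):
    grid[r][c] = store[cell_term(tag, rk, ck)]

# Combine the portions in the order defined by the array.
recovered = b"".join(cell for row in grid for cell in row)
```

Here five keys (two row keys plus three column keys) index six portions, and the order of the portions is carried by the key combinations rather than by any identifier stored with the portions.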

Thus, in addition to the encryption, the factorization of n, corresponding to the dimensions of the array, may be used to encode and decode the data. The dimensions of the array may correspond to the prime factors of the number of file portions n. There are several advantages to this: a) the set of prime factors for n is unique and thus for a given n, both the uploading and receiving clients can create the same array, b) the sum of the prime factors is the minimum sum of the factors of n. Since the number of keys required corresponds to the dimensions of the array, using the prime factors requires a minimal number of keys.
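As a worked illustration of the key count, the following sketch factors n by trial division and sums the prime factors. The function name `prime_factors` and the example value n=30 are chosen for illustration only.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (with multiplicity) by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


dimensions = prime_factors(30)  # array dimensions for n = 30 portions
keys_needed = sum(dimensions)   # one key per index position along each dimension
```

For n=30 the array dimensions are 2, 3 and 5, requiring 2+3+5=10 keys. A two-dimensional 5×6 array would require 11 keys, and a one-dimensional array would require 30.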

This two-tiered method may provide a convenient framework for the authorized users to tag and access their data using contextual plain-text terms via the outer tier, with the added security of storing their data in an encrypted, fragmented and scattered form, via the inner tier.

While stored, the encrypted file portions may appear as generic blocks of encrypted data that do not disclose any information regarding the file to which they belong, nor regarding the other portions belonging to the same file. Additionally, the portions do not disclose the identity of the file's users. For example, the portions may have a uniform format and/or size. Furthermore, the portions cannot be decrypted on their own, and need to be combined first with other portions belonging to the same file. Thus, a hacker intruding into a database storing numerous such portions of numerous files may be faced with a considerable task of determining which portions belong to which file before attempting to decrypt any of the data. Additionally, or alternatively, some of the portions may be stored at different storage devices, further complicating the task of locating the portions required to recover any given file.

A centralized encrypted index may provide access to the portions, and may list the ciphertext index terms together with the encrypted storage locations of the portions as encrypted pairs. The encrypted index may store numerous such pairs for numerous files. Each encrypted pair may appear as a generic index entry, similar in format and/or length to the other encrypted pairs stored in the index, without disclosing any information regarding the file to which it belongs, nor regarding any of the other encrypted pairs that index any of the portions belonging to the same file. Thus a hacker searching through such an index may be faced with a considerable task of determining which pairs belong to any given file. Furthermore, each encrypted pair is encrypted using different key(s), thus even should a hacker succeed in identifying pairs belonging to the same file, decrypting them to access the portions may prove a further challenge. The index may store a sufficient number of pairs to make the task of guessing the set of pairs that index a given file difficult. Furthermore, each file may be divided into a different number of portions, and thus the number of index entry pairs for the different files may vary, presenting another unknown variable to an intruder: not only does the intruder need to know which blocks and/or indexing pairs to use in order to recover a file, but also how many.

By contrast, the authorized user possessing the keys and the plaintext tag can easily derive the ciphertext index terms and query the index to gain access to the portions for recovering the file. A symmetric encryption scheme may be used to encrypt the pairs in the index, such that the key(s) used to derive the search queries may be used to decrypt the storage location returned in response to the query. The encrypted file portion may then be accessed using the decrypted locator. The encryption algorithm for encrypting the file may be independent of the encryption algorithm for encrypting the plaintext, and thus may be any suitable encryption algorithm: symmetric or asymmetric.

Although portions of encrypted data are routinely exchanged via network packets, there are several important differences between network packets and the indexed, encrypted file portions described above: the encrypted portions are indexed and stored statically, and thus are accessible via query by any number of authorized users for any number of queries over time. The encrypted file portions can be updated without requiring them to be reindexed, and a subsequent query will return the updated data. Additional files can be added and similarly fragmented, indexed and stored, such that a query can return multiple files. In contrast, network-packets are generated ‘on the fly’ for each instance of a file transmission, and are neither stored statically nor indexed; the client does not submit individual queries for network-packets, rather a queried file is returned packetized. Furthermore, network-packets include a source and destination address, whereas the encrypted portions reveal no information about their origin or destination, neither do they reveal information that may allow associating different portions. This is an important distinction for maintaining privacy.

Similarly, storage devices routinely partition blocks of encrypted data and store them separately. However, the storage device maintains a map to link those blocks together such that they may be recombined. In contrast, the present invention discloses a method for storing the blocks such that the storage device does not know which blocks belong together, nor how many blocks belong to any given file. This provides an additional security measure should an intruder break into the storage device in an attempt to recover the files stored therein.

The term ‘plain-text tag’ refers to a sequence of characters having a contextual connotation that may be associated with a file, or a user thereof, or device, application, or situation relating to any of the file, the users, and use of the file.

The term ‘ciphertext’ refers to an encoded or encrypted version of a plain-text term that, as a result of the encryption, does not have a contextual association with the file or with a user of the file. A ciphertext may be derived from a plain-text term using any encryption, hashing, or other encoding technique known in the art.

The term ‘file’ as referred to herein is understood to be an electronic document comprising a set of data. Thus a file may be of any size, such as ranging from as small as a single byte of memory, e.g. to indicate a setting for a device, to one or more megabytes, such as for storing multimedia files.

The term ‘client’ as referred to herein is understood to be a set of processing instructions (computer program) implemented in hardware and/or software that integrates and communicates with another computer program.

Reference is now made to FIGS. 1A-1G which, taken together, show a system for sharing a data set, in accordance with an embodiment. Referring to FIG. 1A, a data sharing application includes a first client 110 configured with a first client device 106 and a second client 114 configured with a second client device 108. Client 110 provides client 114 with secure access to encrypted data over a network 102 responsive to client 114 submitting a plain-text search query. Network 102 may be any private or public network, such as a local area network, or the Internet. Devices 106 and 108 may be any type of computing device configured to communicate via wired and/or wireless means over network 102, and may be computers, tablets, mobile phones, and devices associated with the Internet of Things (IoT), to name a few. Although for the purpose of the description, clients 110 and 114 are shown implemented on separate devices, it is to be understood that any of devices 106 and 108 may be configured with both of clients 110 and 114 combined into one client application. In addition to communicating with network 102, devices 106 and 108 may communicate with each other independently of network 102, such as via a phone-based messaging system, local area network, Wi-Fi, Bluetooth, or wired means such as cables and the like. Additionally, although the invention is described with respect to storing and accessing data over a network, this is not meant to be limiting, and the method may be used to store and access data stored locally.

Clients 110 and 114 may each be configured with an outer tier ‘plaintext’ interface, and an inner tier ‘encrypted’ layer. The inner tiers of clients 110 and 114 may be provided with compatible encryption algorithms that, in response to receiving the same plain text, encryption keys and/or seeds, produce the same encrypted text. At least one of the encryption algorithms configured with clients 110 and 114 is symmetric, allowing the use of the same key for encrypting and decrypting data. Clients 110 and 114 may each be configured with compatible key derivation functions that produce matching encryption keys responsive to supplying the same seed or passphrase. Examples of key derivation functions include random or pseudo-random number generators. The seed or passphrase may be sent securely from client 110 to client 114 and used accordingly to generate any of the encryption keys required to retrieve and/or recover encrypted data.
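As a non-limiting sketch of compatible key derivation, the following example derives a list of keys from a shared seed or passphrase, so that two clients supplied with the same seed obtain matching keys. PBKDF2-HMAC-SHA256 from the Python standard library is used here as an assumed key derivation function; it is substituted for illustration and is not named in the description. The function name `derive_keys`, the per-key salt labels, and the iteration count are likewise hypothetical.

```python
import hashlib


def derive_keys(seed: str, count: int, iterations: int = 100_000) -> list[bytes]:
    """Derive `count` distinct 32-byte keys from one seed or passphrase.

    Each key uses a different fixed salt label, so the output is deterministic:
    any client holding the same seed re-derives exactly the same keys.
    """
    keys = []
    for i in range(count):
        salt = f"index-key-{i}".encode("utf-8")  # hypothetical per-key label
        key = hashlib.pbkdf2_hmac(
            "sha256", seed.encode("utf-8"), salt, iterations, dklen=32
        )
        keys.append(key)
    return keys


uploader_keys = derive_keys("shared passphrase", 3)
receiver_keys = derive_keys("shared passphrase", 3)
```

Because the derivation is deterministic, only the seed needs to be conveyed between the clients, for example over a channel independent of the network.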

Referring to FIG. 1B, a conceptual diagram of the steps performed by client 110 for sharing the data with client 114 is shown.

Client 110 receives a file 200 and at least one plain-text term, or ‘tag’ of file 200 via the outer tier and provides them to the inner tier. For example, the outer tier may present a GUI allowing the user of device 106 to tag file 200 with the plain-text term. Alternatively, the tagging may be automatic via a software application that identifies one or more tokens in file 200 and submits the tokens as tags to client 110. The plain-text tag may be contextually relevant to any of file 200, devices 106 and 108, and/or any of the users of devices 106 and 108, allowing file 200 to be shared using a contextually relevant tag via the outer tiers of clients 110 and 114. Optionally, the passphrase for generating the encryption keys may be the tag, or a variation thereof.

Henceforth, the inner tier of client 110 encrypts, divides, indexes and stores the encrypted file data, as follows. Client 110 encrypts file 200 into n portions 201(i) using a file encryption key K_(F) and encryption algorithm ENC_(F), where n is any positive integer greater than 1, and i∈{1 . . . n} is used as an index to reference any of the n encrypted portions. The index term i, as used herein, is understood to index corresponding elements within different sets of n elements, for example for two sets A and B having n elements, A(i) and B(i) are understood to mean the i^(th) element of A and the i^(th) element of B. File 200 may comprise any data type and/or form, such as text, image, multi-media, formatted (spreadsheet, database), parameters associated with a file sharing platform, the IoT, text message, email, and the like.

Optionally, the encrypted file portions are shares derived using a secure secret sharing scheme. For example, encrypted file 200 is divided by client 110 in accordance with a secure (q,n)-threshold secret sharing technique, where q is the minimal number of encrypted portions that need to be retrieved in order to recover file 200. Examples of such algorithms include Shamir's scheme, Rabin's IDA scheme, use of the Chinese Remainder theorem, to name a few. Alternatively, file 200 may be encrypted using any suitable technique, symmetric or asymmetric, and partitioned into n portions such that the size of each portion is approximately 1/n times the size of the encrypted file 200.
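By way of illustration only, a (q,n)-threshold split in the style of Shamir's scheme may be sketched as follows for a small secret represented as an integer. The prime modulus 2^(127)−1, the function names, and the choice to share a short byte string rather than a whole file are assumptions of the sketch; a practical implementation would typically process a file in blocks or share a file key.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus (assumed for this sketch)


def split_secret(secret: int, q: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any q of which suffice to recover it."""
    if not 1 <= q <= n or not 0 <= secret < PRIME:
        raise ValueError("require 1 <= q <= n and 0 <= secret < PRIME")
    # Random polynomial of degree q-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(q - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + coeff) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


secret = int.from_bytes(b"file key", "big")
shares = split_secret(secret, q=3, n=5)
recovered = recover_secret([shares[0], shares[2], shares[4]])
```

In the sketch, any three of the five shares recover the secret, so the receiving client needs to submit only q queries and retrieve q portions.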

Client 110 obtains multiple different index encryption keys K_(I)(n). The index encryption keys K_(I)(n) and optionally the file encryption key K_(F) may be generated from the key derivation function using the passphrase or random seed. Optionally, file 200 is encrypted using an asymmetric scheme using a public/private key pair, and thus, client 114 may obtain the decryption key from a memory of device 108.

Client 110 encrypts the plain-text tag n different times using a symmetric encryption algorithm ENC_(L) and multiple encryption keys K_(I)(n) to derive n differently encrypted ciphertexts, CipherTxT(n), shown as an example in FIG. 1B as CipherTxT(1), CipherTxT(2), and CipherTxT(3) for n=3, each of which is used to index a different one of encrypted file portions 201(n). Examples of symmetric encryption algorithms include exclusive-or (XOR), Advanced Encryption Standard (AES), Twofish, Serpent, Blowfish, CAST5, Grasshopper, RC4, 3DES, Skipjack, to name a few.

Referring to FIG. 1C, each portion 201(i) is associated with one of the ciphertexts and its encryption key K_(I)(i). For example, portion 201(1) is associated with CipherTxT(1) and key K_(I)(1), and portions 201(2) and 201(3) are associated with CipherTxT(2) and key K_(I)(2), and CipherTxT(3) and key K_(I)(3), respectively.

Referring to FIGS. 1C-1D, each of the n encrypted file portions 201(i) may be stored at a different storage location corresponding to a locator string L(i) at one or more (up to n) storage devices 120(i), such as may be provided by a cloud storage service, Internet service provider, local disk, or any other suitable storage medium. Optionally, client 110 may submit file portions 201(n) over network 102 for storage at devices 120(n). Locator strings L(n) may be any suitable storage location identifier that allows subsequently locating and retrieving file portions 201(n) from storage devices 120(n), and may include any suitable storage addresses, including but not limited to uniform resource locators (URLs), uniform resource identifiers (URIs), Internet Protocol (IP) addresses, database or local storage addresses, and the like. Portions 201(n) may be stored within the same storage device 120, or alternatively at multiple different storage devices 120(n). Optionally, at least some of the storage devices are associated with different hosting services, platforms, corporations, and/or enterprises. Optionally, client 110 ensures that no single storage device 120(i) and/or single enterprise stores a sufficient number of portions 201(n) to recover file 200. By storing one or more of portions 201(n) at different platforms, no given platform has access to the portions required to recover file 200. Furthermore, n and/or q may vary for different files shared by client 110, preventing an unauthorized entity from determining how many portions 201(n) are required to recover file 200. Additionally or alternatively, client 110 may store portions 201(n) using an anonymous router, such as TOR, to prevent the mutual association of portions 201(n) as a result of having a common originating IP address.

Client 110 obtains from storage device 120 the locator strings L(n), and for each portion 201(i), encrypts L(i), or a portion thereof, using the associated encryption key K_(I)(i). For example, L(1), L(2) and L(3) for portions 201(1), 201(2), 201(3), are each encrypted using algorithm ENC_(L) with the associated keys K_(I)(1), K_(I)(2), K_(I)(3), to produce encrypted strings given by ENC_(L)(K_(I)(1),L(1)), ENC_(L)(K_(I)(2),L(2)) and ENC_(L)(K_(I)(3),L(3)), respectively. The encrypted locator strings are stored with their associated CipherTxT(i), i=1 . . . 3, allowing the locators to be subsequently accessed using the ciphertexts. Since ENC_(L) is symmetric, the locators may be subsequently decrypted using the same key used for the encryption.

Reference is now made to FIG. 1E, which shows an exemplary implementation of a server-side index resource 116 that may be queried to retrieve portions 201(n) responsive to queries for CipherTxT(n). Client 110 may store the multiple CipherTxT(n) with their associated encrypted locator strings ENC_(L)(K_(I)(n), L(n)) at index 116, as multiple cipher-text/encrypted locator pairs via a server 116a over network 102. A server-side application configured with server 116a may provide read and write services to and from index 116, allowing any of clients 114 and 110 to access index 116 accordingly. It may be appreciated that maintaining index 116 at server 116a is not meant to be limiting, and index 116 may be stored locally.

The pairs may be stored in a manner to prevent identifying pairs associated with the same file, while allowing easy retrieval responsive to a query. For example, the pairs may be stored according to an alpha-numeric order of the ciphertext, thereby scrambling the positions of associated ciphertexts throughout the index, while allowing searching for the ciphertext using efficient techniques. The pairs may conform to a uniform format such that pairs belonging to one file cannot be distinguished from pairs belonging to a different file.
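The ordered storage described above may be sketched as follows. This is a minimal illustration only: the class name SortedIndex, the short hexadecimal strings standing in for ciphertexts, and the locator placeholders are all hypothetical, and the sketch assumes each ciphertext appears once in the index.

```python
import bisect

class SortedIndex:
    """Hypothetical index keeping (ciphertext, encrypted locator) pairs in ciphertext order."""

    def __init__(self):
        self._keys = []    # ciphertexts, kept in sorted order
        self._values = []  # encrypted locators, aligned with _keys

    def put(self, ciphertext, encrypted_locator):
        # Insert at the sorted position, so write order is not preserved.
        pos = bisect.bisect_left(self._keys, ciphertext)
        self._keys.insert(pos, ciphertext)
        self._values.insert(pos, encrypted_locator)

    def get(self, ciphertext):
        # Binary search for an exact match.
        pos = bisect.bisect_left(self._keys, ciphertext)
        if pos < len(self._keys) and self._keys[pos] == ciphertext:
            return self._values[pos]
        return None

index = SortedIndex()
# Pairs for two hypothetical files, written one file after the other.
for ct, loc in [("9f3a", "locA1"), ("02bc", "locA2"), ("c410", "locA3"),
                ("5d77", "locB1"), ("e1a9", "locB2"), ("3b06", "locB3")]:
    index.put(ct, loc)

stored_order = list(index._keys)
```

After the writes, the pairs of the two files are interleaved by ciphertext order rather than grouped by file, while a lookup still takes a logarithmic number of comparisons.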

The search space for CipherTxT(n) may be very large, including many characters and a large alphabet, such that the probability of an unauthorized agent correctly guessing all of the CipherTxT(n), or the subset thereof that is required to recover any file 200, is very low. For example, each of CipherTxT(i) may be 30 characters long and derived from an alphabet of order 10³, yielding a search space in the order of 10⁹⁰. On the other hand, deriving the set of CipherTxT(n) or subset thereof required to recover file 200 is relatively easy for an authorized user possessing the plain-text term and the keys K_(I)(n). Furthermore, due to the large search space, the probability of collisions within index 116 may be very low. Optionally, prior to storing the encrypted pairs, client 110 may query for each of the generated CipherTxT(i) and, should a collision be discovered, generate a new set of keys K_(I)(n) for deriving a new set of ciphertexts.
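The arithmetic behind the 10⁹⁰ figure may be checked as below. The collision estimate is an added illustration, not part of the description above: it assumes ciphertexts are uniformly distributed, uses the standard birthday approximation, and takes a hypothetical index size of 10¹² entries.

```python
from fractions import Fraction

alphabet_size = 10**3   # alphabet of order 10^3
length = 30             # characters per ciphertext

# Number of distinct 30-character strings over the alphabet.
search_space = alphabet_size ** length

# Birthday approximation p ~ N^2 / (2 * S) for N uniformly random entries.
# N is a hypothetical index size chosen only for illustration.
entries = 10**12
collision_estimate = Fraction(entries * entries, 2 * search_space)
```

Under these assumptions the estimated collision probability is about 5×10⁻⁶⁷, consistent with the statement that collisions within the index are very unlikely.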

Referring back to FIG. 1D, subject to authorization of device 108, client 110 may provide client 114 with cipher information for generating CipherTxT(n), such as over a secure channel that bypasses network 102. For example, the cipher information may include any of: the plain text tag, the index encryption keys K_(I)(n), the file encryption key K_(F), the seed/passphrase for generating K_(I)(n), and/or the number of portions m that need to be retrieved in order to recover the file. For example, if file 200 was encrypted using (q,n)-secret sharing, m=q<n; alternatively, if file 200 was encrypted and then divided up into n portions, m=n. Optionally, the passphrase may be the plaintext tag, which may be provided to client 114 independently, such as via a private exchange between the users of devices 106 and 108. Optionally, the number of required portions m is predefined, and thus in some embodiments there is no requirement for client 114 to receive cipher information from client 110 to recover the file.

Referring to FIG. 1F, the inner encrypted tier of client 114 receives the plain-text tag as a search query from a user via the outer plaintext tier. Client 114 obtains the required encryption keys, such as by using the cipher information received from client 110 to derive the keys from the key derivation function, and encrypts the tag multiple times using the index encryption keys to derive the search queries, CipherTxT(i), i=1 . . . m.

Client 114 may communicate with index 116 over network 102 and may submit the search queries to index 116 to retrieve their associated encrypted locators L(m). The encrypted locators L(m) may then each be decrypted by client 114 using the respective index encryption keys used to derive the search queries. Referring to the example shown in FIG. 1E, in response to querying for CipherTxT(1), CipherTxT(2), and CipherTxT(3), client 114 retrieves their associated encrypted locator addresses, ENC_(L)(K_(I)(1), L(1)), ENC_(L)(K_(I)(2), L(2)), and ENC_(L)(K_(I)(3), L(3)), which may be decrypted using keys K_(I)(1), K_(I)(2), and K_(I)(3), respectively, for retrieving portions 201(1), 201(2), and 201(3) to recover file 200. Other suitable implementations for storing encrypted locator strings ENC_(L)(K_(I)(n), L(n)) with the corresponding CipherTxT(n) may similarly be used.
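The indexing and lookup steps described so far may be sketched end to end as follows. The cipher here is a toy XOR stream built from SHA-256 and is only a stand-in for ENC_(L); it is not secure and is not the algorithm of the embodiments. The function names enc_l, dec_l and derive_keys, the passphrase, the purpose labels and the example URLs are all hypothetical.

```python
import hashlib

def _keystream(key, purpose, length):
    # Expand key + purpose label into `length` pseudo-random bytes.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + purpose + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def enc_l(key, data, purpose):
    """Toy deterministic symmetric cipher standing in for ENC_L (NOT secure)."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, purpose, len(data))))

dec_l = enc_l  # for an XOR stream, decryption is the same operation

def derive_keys(passphrase, n):
    # Hypothetical key derivation: one key per portion from a shared passphrase.
    return [hashlib.sha256(f"{passphrase}|{i}".encode()).digest() for i in range(1, n + 1)]

tag = "family photo 2016"
locators = ["https://host-a.example/p1",
            "https://host-b.example/p2",
            "https://host-c.example/p3"]

# Uploading side (client 110): one ciphertext/encrypted-locator pair per portion.
index = {}
for key, locator in zip(derive_keys("shared passphrase", 3), locators):
    ciphertext = enc_l(key, tag.encode(), b"tag").hex()
    index[ciphertext] = enc_l(key, locator.encode(), b"loc")

# Retrieving side (client 114): re-derive the keys, query, decrypt the locators.
recovered = []
for key in derive_keys("shared passphrase", 3):
    query = enc_l(key, tag.encode(), b"tag").hex()
    recovered.append(dec_l(key, index[query], b"loc").decode())
```

The index holds neither the plaintext tag nor any plaintext locator; a client holding the passphrase and the tag regenerates the same three queries and recovers the three locators.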

Client 114 uses the decrypted locators to retrieve portions 201(m) from their storage locations at storage device 120. For example, client 114 may submit the decrypted locators to a data retrieval application 130, such as a browser application that is configured with device 108 to retrieve data via network 102. The retrieved portions 201(m) may be used to recover file 200 in accordance with the decryption scheme corresponding to the encryption algorithm ENC_(F), and the required file decryption key. For example, if file 200 was encrypted using a symmetric algorithm, K_(F) may be used to decrypt the file, where K_(F) was either derived locally at device 108 using the key derivation function, or otherwise received. Alternatively, if file 200 was encrypted using an asymmetric algorithm, a private key may be used to recover file 200.

Optionally, client 114 may retrieve portions 201(m) via multiple different file retrieval applications 130 configured with device 108 to prevent any one application or browser from mutually associating the retrieved portions or webpages. Optionally, client 114 may ensure not to retrieve via any one file retrieval application 130 a sufficient number of portions to recover file 200.

Either or both of client 114 and client 110 may use an anonymous identity via an anonymous router such as TOR when submitting queries and/or write commands to index 116, to prevent mutually associating the ciphertexts as a result of detecting that the queries originated from the same IP address.

The technique above may be used to share multiple files tagged with the same plaintext tag, as follows. Reference is now made to FIG. 1G, which illustrates an implementation for sharing multiple files using the same plaintext tag, in accordance with an embodiment. Each of w files 200(w) may be encrypted into n portions as described above, to yield w×n portions 201(w)(n). These are illustrated in FIG. 1G as File A and File B having portions A(1), A(2), A(3), B(1), B(2), and B(3) for w=2 and n=3. The plain text tag may be encrypted as above using K_(I)(1), K_(I)(2), and K_(I)(3) to yield 3 unique ciphertexts. Each CipherTxT(i) derived using K_(I)(i) may index the i^(th) encrypted portion of each of the w files, as indicated by the dashed lines. Thus CipherTxT(1) indexes Portion A(1) and Portion B(1), comprising the first portions of each of File A and File B. Similarly, CipherTxT(2) and CipherTxT(3) index the second and third portions of each of File A and File B, respectively. Each query for CipherTxT(i) may retrieve w portions, 201(j)(i) for j ∈ {1 . . . w}, and querying for all n ciphertexts retrieves w×n portions.

Recovering w files from w×n portions requires sorting the portions according to their respective files. One method to achieve this would be to assign a different file identifier (ID) to each file and store each portion with its file ID. On retrieving the portions, client 114 could extract the file ID, sort the portions according to the file ID, and recover each file from its sorted portions. However, while the portions are stored at storage device 120 or in transit over network 102, an unauthorized agent may identify the common file ID and associate the portions as belonging to the same file.

A solution to this may be to encrypt each file ID n times using each of keys K_(I)(i) to yield w×n unique encrypted file identifiers denoted by: ENC_(L)(K_(I)(i), fileID(j)) for i ∈ {1 . . . n}, j ∈ {1 . . . w}. Each file ID encrypted with key K_(I)(i) may be stored with the portion indexed by CipherTxT(i).

Table 1 below illustrates a simplified example of this scheme that may be mapped onto the array arrangement of file portions shown in FIG. 1G. The file IDs for File A and File B, ‘A’ and ‘B’, are indicated in the rows of Table 1, corresponding to w in FIG. 1G, and the encryption keys K_(I)(1), K_(I)(2), K_(I)(3) are shown as columns, corresponding to n in FIG. 1G. Each file ID is encrypted using each of keys K_(I)(1), K_(I)(2), K_(I)(3), resulting in six unique encrypted file IDs, shown in the cells of Table 1:

TABLE 1

File ID   Key K_(I)(1)           Key K_(I)(2)           Key K_(I)(3)
A         ENC_(L)(K_(I)(1), A)   ENC_(L)(K_(I)(2), A)   ENC_(L)(K_(I)(3), A)
B         ENC_(L)(K_(I)(1), B)   ENC_(L)(K_(I)(2), B)   ENC_(L)(K_(I)(3), B)

Each of these encrypted file IDs may be included with its corresponding portion of FIG. 1G. Thus, ENC_(L)(K_(I)(1), A) is stored with Portion A(1), ENC_(L)(K_(I)(1), B) is stored with Portion B(1), etc. Dummy characters may be added as necessary to any of the file IDs, locator strings, and plain-texts, to allow encrypting using the same key. On retrieving the portions, client 114 may extract the file IDs and decrypt them using the associated key. The w×n portions may be sorted accordingly and used to recover the w files.
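The sorting of w×n retrieved portions by encrypted file ID may be sketched as below for w=2 and n=3. The function toy_enc is a toy XOR cipher standing in for ENC_(L) and is not secure; the seed, the 8-byte zero padding (one possible form of the dummy characters mentioned above) and the payload strings are hypothetical.

```python
import hashlib
from collections import defaultdict

def toy_enc(key, data):
    """Toy XOR cipher standing in for ENC_L (NOT secure); data must be at most 32 bytes."""
    stream = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

toy_dec = toy_enc  # XOR is its own inverse

keys = [hashlib.sha256(b"seed" + bytes([i])).digest() for i in (1, 2, 3)]
file_ids = [b"A", b"B"]

# Upload side: each portion is stored with its file ID encrypted under the
# key that indexes it. File IDs are padded to a fixed width with dummy bytes.
retrieved = []  # (key index, encrypted file ID, portion payload)
for i, key in enumerate(keys):
    for fid in file_ids:
        payload = b"portion-" + fid + str(i + 1).encode()
        retrieved.append((i, toy_enc(key, fid.ljust(8, b"\0")), payload))

# Retrieval side: the key index is known from the query that returned the
# portion, so the file ID can be decrypted and the portions grouped per file.
by_file = defaultdict(dict)
for i, enc_fid, payload in retrieved:
    fid = toy_dec(keys[i], enc_fid).rstrip(b"\0")
    by_file[fid][i] = payload

unique_encrypted_ids = {enc_fid for _, enc_fid, _ in retrieved}
```

The six stored identifiers are all different, as in Table 1, yet decrypting each with its own key sorts the six portions back into File A and File B.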

Any given file 200 may be separately shared with multiple different users or groups of users using a different set of encryption keys and ciphertexts for each user and/or group. As an example, a study of a patient's progress regarding an experimental drug ‘XYZ’ may simultaneously belong to two different archives, one for sharing with the patient undergoing the treatment, and the other with a group of doctors following the progress. Two different sets of index encryption keys, K_(P)(n) and K_(D)(n), may be generated and used to derive from the plain text tag XYZ two sets of ciphertexts, CipherTxt_(P)(n) and CipherTxt_(D)(n), which are used to separately index the file. Access to keys K_(P) and K_(D) may be provided to the patient and the doctors, respectively, allowing each to derive CipherTxt_(P)(n) and CipherTxt_(D)(n), respectively, from the plain text ‘XYZ’ and retrieve the file. Alternatively, a different plain-text tag may be used for each user/group.

By enabling a single file to be separately shared with multiple different users using a different set of encryption keys/plaintexts, and enabling multiple files to be shared using the same set of encryption keys/plaintexts, the system and method described herein may provide a flexible and secure file sharing platform that allows users to define how and with whom to share their data. Additionally, once a file is indexed, it can be subsequently modified without affecting the indexing, allowing modifications to propagate to all users to maintain data integrity. Similarly, new files can be added and indexed using existing index terms, allowing a single query to retrieve an updated archive of multiple files.

Optionally, the number of indexing keys (u) used to derive the n ciphertexts may be less than n. Since u encryption keys may be combined into as many as 2^(u)−1 different non-null combinations, each ciphertext may be derived from a different combination of keys, by encrypting the plaintext multiple times using each key of the combination.

Thus, the number of portions n and/or the number of index encryption keys u may be selected according to the relationship n≤2^(u)−1. As a trivial example, 2 keys may be grouped into 3 different non-null combinations: {K1}, {K2}, {K1,K2}, which may be used to generate 3 different ciphertexts from the same plaintext by encrypting with each of: K1, K2, and both of K1 and K2.
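The relationship n≤2^(u)−1 may be illustrated by enumerating the non-null key combinations. The helper names non_null_combinations and min_keys_for are hypothetical, and the keys are represented by labels only.

```python
from itertools import combinations

def non_null_combinations(keys):
    """All non-empty subsets of the keys, ordered by increasing cardinality."""
    keys = list(keys)
    return [c for r in range(1, len(keys) + 1) for c in combinations(keys, r)]

def min_keys_for(n):
    """Smallest u such that n <= 2**u - 1."""
    u = 1
    while 2**u - 1 < n:
        u += 1
    return u

combos = non_null_combinations(["K1", "K2"])
```

For the trivial example, combos holds the three combinations {K1}, {K2} and {K1,K2}; by the same relationship, indexing 10 portions would need at least 4 keys.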

Optionally, only some of the 2^(u)−1 combinations of the u encryption keys may be used. For example, the plaintext may be encrypted by those combinations of keys having the same cardinality, such as only the pairs, or triplets, of the u encryption keys. Optionally, the cardinality corresponds to the number of prime factors of n, which will be described in greater detail below with respect to Table 2.

Alternatively, the combinations of the u keys used to derive the ciphertexts may be selected in accordance with a maximum or minimum cardinality constraint such that each ciphertext is derived using a minimum or maximum number of encryption steps.

The cardinality of the 2^(u) combinations is distributed binomially. Using the trivial example above with 2 keys, there is a ⅔ probability that the ciphertext is derived by encrypting the plaintext once, and a ⅓ probability that the ciphertext is derived by encrypting the plaintext twice. Thus, in one embodiment, the combinations of the u encryption keys may be selected to have a variable and uniformly distributed cardinality, providing an additional uncertainty for a hacker to contend with. For example, u=5 yields 31 non-null key combinations: 5 combinations of 1 key, 10 combinations of 2 keys, 10 combinations of 3 keys, 5 combinations of 4 keys, and 1 combination of 5 keys. The minimum cardinality may be set as 2, and a uniform cardinality may be imposed over the sets. Thus, 5 combinations of each of 2 keys, 3 keys, and 4 keys may be used to generate 5×3=15 ciphertexts for indexing 15 file portions, using 5 encryption keys. The probability of encrypting the plaintext 2, 3, or 4 times to derive any of the ciphertexts is uniform, in this case ⅓.
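The counts in the u=5 example may be reproduced as follows. Which 5 combinations are taken from each cardinality is not specified above; taking the first 5 in lexicographic order is an arbitrary choice made only for this sketch.

```python
from collections import Counter
from itertools import combinations
from math import comb

keys = ["K1", "K2", "K3", "K4", "K5"]

# Number of combinations of each cardinality: binomial coefficients C(5, r).
counts = {r: comb(len(keys), r) for r in range(1, len(keys) + 1)}

# Keep 5 combinations of each of the cardinalities 2, 3 and 4.
selected = []
for r in (2, 3, 4):
    selected.extend(list(combinations(keys, r))[:5])

# How many selected combinations require 2, 3 or 4 encryption steps.
histogram = Counter(len(c) for c in selected)
```

The 15 selected combinations are distinct and split evenly across the three cardinalities, so each number of encryption steps occurs with probability ⅓.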

By imposing a variable cardinality on the key combinations used to derive the ciphertexts, the complexity for a hacker to decrypt the ciphertext/locator pairs may be further increased. n and/or u may be selected in accordance with any of the above constraints.

Alternatively, if the multiple applications of the symmetric encryption ENC_(L) are non-commutative, the permutations of each key combination may be used to derive additional ciphertexts from the same plaintext, requiring even fewer keys to generate additional encrypted indexing terms.

The associated locators and file IDs may be similarly encrypted and decrypted using the combination of keys used to derive the ciphertexts. It may be appreciated that in addition to reducing the number of encryption keys, this technique may increase the complexity for decrypting the locators and file IDs.

In some embodiments, the retrieved file portions 201(n) need to be combined in the correct sequence in order to recover file 200. For example, file 200 may be encrypted into a file 201, and partitioned into n encrypted file portions such that concatenating the n encrypted file portions reconstructs the encrypted file 201, which may then be decrypted to recover file 200, where n may be selected in accordance with constraints such as portion size, a minimum or maximum number of portions, and/or any of the constraints described above. It may be appreciated that storing the n portions divided thus may demand less memory than storing file portions derived using secure secret sharing.

Since the n portions may be recombined n! ways, client 110 may map the portions according to the sequence of the encryption keys used to derive the ciphertexts. The sequence of the encryption keys may thus be used to determine the sequence for recombining the retrieved portions. For example, the first key derived from the key derivation function may be used to index and subsequently retrieve the first portion, the second derived key may be used for the second portion, etc.

To use fewer keys than the number of portions, client 110 may map the n file portions into the cells of an array, each dimension of which corresponds to a different factor of n. Each dimension of the array is assigned a set of encryption keys, such that each cell is uniquely indexed by a unique combination of keys, one from each set. The combination of the encryption keys thus serves as coordinates for positioning the file portions within the array.

For example, the sequence of keys derived from the key derivation function may be assigned sequentially to index the array, and both clients 110 and 114 can assign the same sets of keys to index the cells of the same array at both the uploading and retrieving ends. Each portion is indexed and subsequently accessed using its unique combination of keys. On retrieving a portion, its position within the file can be determined from the key combination used to derive its search query. The number of keys required to derive n ciphertexts is thus the sum of the factors of n. In addition to the encryption, the factorization of n, corresponding to the dimensions of the array, may be used to encode and decode the data, since different factorizations of n yield different combinations of the encryption keys, and different schemes for the order of the encrypted file portions. This property may be leveraged to securely share the file between clients 110 and 114, by synchronizing the factorization of n for both the uploading and the retrieving of the file portions.

Optionally, the prime factors of n may be selected to define the array. Since the set of prime factors of n is unique, an array arranged in increasing (or decreasing) order of the prime factors is unique as well. Thus, knowing n and a predefined order for the prime factors of n, each of clients 110 and 114 can create the identical array without having to exchange information.

Thus, knowing the seed/passphrase for the key derivation function, the plain text tag, and n may be sufficient for recovering the file. Using this cipher information, client 114 can construct the unique array indexed with the keys, generate the search queries and correctly map the retrieved portions to recover the file.

An example of this is illustrated in Table 2 below, which shows a 2×5 array that uses 7 keys to encrypt 10 unique index terms for each of 10 file portions, and to position the 10 file portions on retrieval. Clients 110 and 114 each factor 10 into the prime factors 2 and 5, and each create a 2×5 array. Using the same seed, each of clients 110 and 114 derives 7 encryption keys K1 . . . K7, and assigns them sequentially in the order that they were derived to index the array: K1 and K2 are assigned to the 2 rows, and K3 . . . K7 are assigned to the 5 columns. Thus, each cell in the array is indexed by a unique combination of a row key and a column key.

Client 110 organizes the 10 encrypted file portions, indicated by the numbers in each cell, column-wise; however, this is not meant to be limiting, and any ordering scheme may be used:

TABLE 2

      K3          K4          K5          K6          K7
K1    1: K1, K3   3: K1, K4   5: K1, K5   7: K1, K6    9: K1, K7
K2    2: K2, K3   4: K2, K4   6: K2, K5   8: K2, K6   10: K2, K7

Client 110 indexes the first portion with the ciphertext derived by encrypting the plaintext using keys K1 and K3; similarly, the second portion is indexed with the ciphertext derived by encrypting the plaintext using keys K2 and K3, etc. Each ciphertext is derived by encrypting the plaintext ν times, where ν is the number of prime factors of n.

On retrieving the portions, client 114 determines the position of each retrieved portion according to the combination of encryption keys used to derive its search query. Thus, the portion retrieved in response to querying for the ciphertext derived using the key combination K1 and K3 is inserted into the top left cell, corresponding to the first portion; the portion retrieved in response to querying for the ciphertext derived using K2 and K3 is inserted into the bottom left cell, corresponding to the second portion, etc. Once all the portions are retrieved and the array is filled, the portions may be recombined in their correct order to recover the encrypted file, which may be decrypted to recover the original plaintext file.
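The array of Table 2 may be reconstructed from n alone, as sketched below. The function names prime_factors, key_grid and keys_for_portion are hypothetical, keys are represented by the labels K1 . . . K7, and the ordering with the first dimension varying fastest is the column-wise choice of Table 2, generalized here to any number of dimensions as an assumption.

```python
def prime_factors(n):
    """Prime factors of n in increasing order, with multiplicity."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def key_grid(n):
    """Assign sequentially numbered keys to each dimension of the array."""
    dims = prime_factors(n)
    axes, next_key = [], 1
    for d in dims:
        axes.append([f"K{next_key + j}" for j in range(d)])
        next_key += d
    return dims, axes

def keys_for_portion(portion, dims, axes):
    """Key combination indexing a 1-based portion number (first axis varies fastest)."""
    p = portion - 1
    combo = []
    for d, axis in zip(dims, axes):
        combo.append(axis[p % d])
        p //= d
    return tuple(combo)

dims, axes = key_grid(10)
table = {i: keys_for_portion(i, dims, axes) for i in range(1, 11)}

# Retrieval side: invert the mapping to place each retrieved portion.
position = {combo: i for i, combo in table.items()}
```

Both clients can run the same construction independently; for n=10 it uses 2+5=7 keys, and the combination (K2, K3) maps back to the second portion as in Table 2.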

Although a two-dimensional array is shown, this is for illustrative purposes only, and the number of dimensions of the array, and thus the number of times the plaintext is encrypted to derive each ciphertext, corresponds to the number of prime factors of n.

Thus, the encryption keys play multiple roles: they encode a single plaintext to derive multiple index entries for each of the file portions; since the encoding comprises encryption, the index entries for any given file are hard to guess by a non-authorized entity; additionally, the unique combination of keys used to derive the indexing terms may be used to position the retrieved encrypted file portion. When the portions of multiple files are shared, multiple such arrays may be constructed, one per file, using the fileID to sort the retrieved portions.

It may be appreciated that several implementations for using multiple encryption keys to encode the plaintext for indexing multiple portions have been described; however, this is not meant to be limiting, and other suitable methods may be similarly used.

The associated locator strings and file IDs may be encrypted and decrypted using the respective combination of keys. In addition to requiring fewer encryption keys, the multiple decryptions required to locate each file portion may increase the complexity for a non-authorized user to recover file 200.

File portions 201(n) may be stored using any suitable method. For example, the portions may be stored as blocks of data within a document-type database. Alternatively, file portions 201(n) may be stored as payloads within unlinked webpages that are inaccessible by web crawlers or robots.

Reference is now made to FIGS. 2A-2B, which show an exemplary implementation for storing multiple portions within multiple packets configured as webpages, using the system of FIGS. 1A-1G. An image file 200 may be tagged with the plain text ‘family photo 2016’. Client 110 may encrypt the plain-text three times using algorithm ENC_(L) and keys K_(I)(1), K_(I)(2), and K_(I)(3), for n=3, to derive CipherTxT(i), i=1 . . . 3. Client 110 may use a (2,3) secret sharing technique ENC_(F) to encrypt file 200 into three portions, 201(1), 201(2), 201(3), of which any 2 are required to recover file 200. Each of portions 201(1), 201(2), 201(3) is associated with a different one of keys K_(I)(1), K_(I)(2), and K_(I)(3), and CipherTxT(i), i=1 . . . 3, respectively. Client 110 may derive a file ID using a hashing function and encrypt the file ID three times using keys K_(I)(1), K_(I)(2), and K_(I)(3) to produce three encrypted file identifiers, ENC_(L)(K_(I)(i), fileID) for i ∈ {1 . . . 3}.

Referring to FIG. 2B, client 110 may create three webpages 222(1), 222(2), and 222(3) conforming to a predefined protocol, allowing for subsequent extraction of the file IDs and portions. For example, the encrypted file IDs may be stored as a script comment (‘//’) between two delimiting <script></script> tags, or as an HTML comment, between “<!--” and “-->” tags, or by any other suitable method that allows subsequent extraction of the encrypted file ID. The encrypted file IDs for portions 201(1), 201(2), 201(3), given by the expressions ENC_(L)(K_(I)(1), FILE ID), ENC_(L)(K_(I)(2), FILE ID), ENC_(L)(K_(I)(3), FILE ID), are indicated for illustrative purposes in FIG. 2B as ‘*********’, ‘#########’, and ‘$$$$$$$$$$’, respectively. Similarly, the encrypted portions 201(1), 201(2), 201(3) may be stored within the body of the webpages 222(1), 222(2), and 222(3), between <body> and </body> tags, or alternatively as comments. It may be appreciated that this implementation is illustrative only and is not meant to be limiting.

Client 110 may upload webpages 222(1), 222(2), and 222(3) over network 102 to be hosted at three different storage hosting services 120(n) in association with three different URLs. Client 110 may encrypt the URL for each uploaded webpage 222(i) using the associated key K_(I)(i) and store each encrypted URL at index 116 in association with its associated CipherTxT(i), as described above, for subsequent retrieval by client 114.

Client 114 obtains at least keys K_(I)(1), K_(I)(2). The user of device 108 may be provided with the plaintext ‘family photo 2016’, and enters it into a GUI provided by client 114. Client 114 encrypts the plaintext to derive CipherTxT(1) and CipherTxT(2) using keys K_(I)(1), K_(I)(2) and ENC_(L). Client 114 queries index 116 using CipherTxT(1) and CipherTxT(2) to retrieve their associated encrypted URLs, which are decrypted accordingly using keys K_(I)(1), K_(I)(2). Client 114 uses the decrypted URLs to retrieve webpages 222(1), 222(2), optionally using two different browser applications 130. Client 114 extracts file portions 201(1) and 201(2) from webpages 222(1), 222(2) and uses them to recover file 200 using a (q,n) secret sharing recovery algorithm corresponding to the above (q,n) secret sharing encryption algorithm. If one of portions 201(1) or 201(2) is corrupted or inaccessible, client 114 may obtain K_(I)(3) and use it to retrieve portion 201(3).

Optionally, multiple encrypted portions belonging to different files may be stored within the same webpage, allowing clients 110 and 114 to share multiple files using the same set of webpages.

Clients 114 and/or 110 may apply a normalization technique, as known in the art, to convert any entered plain-text tag to a normalized form prior to encryption by key K_(I)(i). For example, the normalization may neutralize capitalization of letters, or remove spaces, and thus either of plain-text tags ‘Family’ and ‘family’ may be used to retrieve file 200. Similarly, either of plain-text tags ‘photo 2016’ and ‘photo2016’ may be used to recover file 200.
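One possible normalization may be sketched as below. The specific rule (case folding followed by removal of all whitespace) and the function name normalize_tag are assumptions made for illustration; any normalization applied identically by both clients would serve.

```python
def normalize_tag(tag):
    """Fold case and drop all whitespace so equivalent tags encrypt identically."""
    return "".join(tag.casefold().split())

variants = ["Family Photo 2016", "family photo 2016", "familyphoto2016"]
normalized = {normalize_tag(v) for v in variants}
```

All three variants reduce to the single string ‘familyphoto2016’, so each would yield the same ciphertexts under the same keys.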

Optionally, each of multiple files may be tagged with more than one plain-text tag, and thus indexed via multiple sets of ciphertexts, providing a hierarchical file retrieval platform in which different files are retrieved responsive to different plain text search queries using the same or a different set of keys K_(I)(n), so that secure access to data can be structured and organized. For example, different levels of authorization may be granted by disclosing different plain-texts to different users all sharing the same set of keys K_(I)(n), or subsets of keys.

Additionally, the memory blocks for storing portions 201(n) may be set up in advance as placeholders and indexed, and the portions uploaded subsequently.

Reference is now made to FIG. 3, which illustrates a system to index the multiple cipher-text/locator pairs, in accordance with an embodiment. To prevent mutually associating the multiple cipher-text/locator pairs by monitoring index 116 and observing that the pairs were written and/or read to/from index 116 within a short time frame, or originated from the same IP address, multiple copies of index 116 may be distributed at different servers 116a. Clients 110 and 114 may distribute the read/write requests across the different indexes 116, while ensuring not to write to and/or query from any one index 116 a sufficient number of ciphertext/locator pairs to recover file 200. To ensure all of indexes 116 are updated, a synchronizer 126 may periodically update indexes 116, such as by logically OR'ing the inclusion of each cipher-text/locator pair across all the indexes 116 and adding any missing pairs as necessary. The time interval for updating may be balanced against the frequency of uploading files by client 110, i.e., a long time interval may allow a sufficient number of cipher-text/locator pairs to be written to indexes 116 such that correctly grouping any newly written entries into their corresponding file may be considerably complex, whereas a short time interval may increase reliability for subsequent queries by client 114. If clients 110 and 114 access index 116 anonymously, in a high traffic environment, the task of grouping the index queries belonging to any one file may be considerable.
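The OR-style synchronization of index copies may be sketched as a set union over the replicas. Each replica is modeled as a plain dictionary from ciphertext to encrypted locator; the function name synchronize and the sample entries are hypothetical.

```python
def synchronize(indexes):
    """Bring every index copy up to the union of all ciphertext/locator pairs."""
    union = {}
    for idx in indexes:
        union.update(idx)
    for idx in indexes:
        for ciphertext, locator in union.items():
            # Add only the pairs this copy is missing.
            idx.setdefault(ciphertext, locator)

# Three replicas, each having received one of the three writes for a file.
replica_a = {"ct1": "enc_loc1"}
replica_b = {"ct2": "enc_loc2"}
replica_c = {"ct3": "enc_loc3"}

synchronize([replica_a, replica_b, replica_c])
```

Before synchronization no replica holds enough pairs to locate all portions; afterwards every replica can answer any of the three queries.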

In one embodiment, index 116 may be implemented via a publicly accessible search engine. Each ciphertext/encrypted locator pair may be stored within an indexable webpage. The ciphertext may be stored in an indexable portion of the webpage, such as within a pair of <header>, </header>; <title>, </title>; and/or <body>, </body> hypertext markup language (HTML) tags, or any other suitable indication to the search engine to index the webpage according to the ciphertext. The encrypted locator may be stored in a manner that is not indexable, such as a Javascript or HTML comment, or padded to exceed the indexable token length. Client 110 may submit such a webpage for each ciphertext/encrypted locator pair for indexing by the search engine via a webmaster tool. Once indexed, client 114 may query the search engine using the ciphertexts, retrieve the webpages and extract the encrypted locators to retrieve the portions, as described above.

It may be noted that one or more of the steps described above as being performed by either of client 110 or 114 may be performed by a server-side application associated with any of clients 110 and 114, such as by implementing one or more portions of clients 110 and/or 114 as cloud-computed applications.

Optionally, device 106 may be operative within the IoT and may be configured with one or both of clients 110 and 114. Client 110 may store one or more parameters, such as one or more settings for operating device 106. The plain-text tag may be any suitable plain-text for retrieving the parameters, such as the device ID of device 106, or a password. Responsive to receiving the plaintext search query, client 114 may generate the ciphertexts and retrieve the settings to operate device 106 accordingly. The plaintext may be received automatically by client 114 responsive to a signal such as a time, temperature, GPS, or other signal or alert. Optionally, the received signal may be the plaintext search query.

Similarly, device 106 may program device 108 by tagging parameters for operating device 108 using the device ID of device 108 and/or device 106 and storing them as above. Responsive to a signal, device 108 may initiate client 114 with the device ID as the plain-text, retrieve the parameters, and operate according to the received parameters. For example, device 106 may be a mobile phone, and device 108 may be an air conditioner. Device 106 may set the thermostat, timer and fan setting of device 108 as described above and store them in encrypted format on the cloud. Responsive to a signal, such as a time, temperature, or Global Positioning System (GPS) signal indicating the proximity of device 106, device 108 may retrieve the parameters using its ID as the plaintext, decrypt the parameters and use them to operate accordingly. These examples are meant to be illustrative only, and other suitable methods for operating devices 106 and 108 securely over network 102 using the methods described herein may be used.

Reference is now made to FIG. 4A, which shows a flowchart of a method for sharing an encrypted file, in accordance with an embodiment. A first and second client may each be instantiated (Step 400). Optionally, the first client may be instantiated on one device and the second client may be instantiated on a different device. Any of the devices may be operable with the IoT. The first client receives a file and a plaintext tag (Step 402) and provides the second client with secure access to the file using the plaintext tag according to the method of the flowchart of FIG. 4B (Step 404). The second client receives a search query comprising the plaintext tag (Step 406), and uses the search query to securely retrieve the file using the method of the flowchart of FIG. 4C (Step 408).

Reference is now made to FIG. 4B, which shows a flowchart of a method for the first client to provide the second client with secure access to the file. The received file may be encrypted into multiple encrypted portions (Step 410). Each encrypted portion may be separately stored at a storage device (Step 412). Multiple different ciphertexts may be derived by encrypting the plaintext tag using multiple different encryption keys (Step 414). Each stored encrypted portion may be separately indexed at an index with a different one of the ciphertexts (Step 416).

Reference is now made to FIG. 4C, which shows a flowchart of a method for the second client to securely retrieve the file. Multiple different encrypted search queries may be derived by encrypting the search query using the multiple different encryption keys (Step 418). The multiple encrypted search queries may be separately submitted to an index (Step 420). The multiple encrypted portions may be separately retrieved from the storage device (Step 422), and the file may be recovered from the multiple encrypted portions (Step 424).
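
By way of non-limiting illustration only, the flows of FIGS. 4B and 4C may be sketched as follows. The sketch rests on several assumptions that are not taken from the description: the index and the storage device are modeled as in-memory dictionaries, the plaintext tag is "encrypted" with a keyed hash (HMAC-SHA-256), and the file is divided into portions with a simple n-of-n XOR split. The names index_term, split_xor, combine_xor, share_file and retrieve_file are hypothetical.

```python
import hashlib
import hmac
import secrets


def index_term(key: bytes, tag: str) -> str:
    """Keyed hash of the tag; an assumed stand-in for encrypting the tag."""
    return hmac.new(key, tag.encode(), hashlib.sha256).hexdigest()


def split_xor(data: bytes, n: int) -> list[bytes]:
    """n-of-n XOR split: all n portions are needed to recover the data."""
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = data
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def combine_xor(shares: list[bytes]) -> bytes:
    """XOR all portions together to recover the original data."""
    out = bytes(len(shares[0]))
    for share in shares:
        out = bytes(a ^ b for a, b in zip(out, share))
    return out


def share_file(data, tag, keys, storage, index):
    """First client (Steps 410-416): split, store, and index each portion."""
    for key, portion in zip(keys, split_xor(data, len(keys))):
        location = secrets.token_hex(8)          # random storage location
        storage[location] = portion              # Step 412
        index[index_term(key, tag)] = location   # Steps 414-416


def retrieve_file(query, keys, storage, index):
    """Second client (Steps 418-424): one encrypted query per key."""
    portions = [storage[index[index_term(key, query)]] for key in keys]
    return combine_xor(portions)


keys = [secrets.token_bytes(32) for _ in range(3)]
storage, index = {}, {}
share_file(b"vacation photo bytes", "holiday-2015", keys, storage, index)
recovered = retrieve_file("holiday-2015", keys, storage, index)
```

In this sketch a query that does not match the tag produces index terms that are absent from the index, so no portion is located.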

The ciphertexts may be derived from the plaintext by applying a symmetric encryption algorithm to the plaintext using a unique combination of the encryption keys, where a single key is understood to be a combination of one key. Each stored encrypted portion may be separately indexed by storing the ciphertext indexing the portion together with an encrypted storage location string of the stored encrypted portion, where the encrypted storage location string was encrypted using the symmetric encryption algorithm and the unique combination of encryption keys used to encrypt the indexing ciphertext.

Similarly, for retrieving the multiple encrypted portions, for each encrypted search query, the encrypted locator string stored with the ciphertext corresponding to the search query may be retrieved, the encrypted locator string may be decrypted using the unique combination of the encryption keys used to derive the encrypted search query, and the encrypted file portion may be retrieved from the storage device using the decrypted locator string.
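
The indexing and lookup of an encrypted locator string may be illustrated, purely as a sketch, as follows. The cipher used here is an assumption made so that the example needs only the standard library: a toy deterministic stream cipher built from SHA-256 in counter mode, which is not secure and merely stands in for the symmetric encryption algorithm. The function names, the label parameter and the location string "bucket-7/obj-42" are hypothetical.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes, label: bytes) -> bytes:
    """Toy deterministic stream cipher (illustration only, NOT secure).

    The label separates the keystream used for tags from the one used
    for locator strings.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + label + counter.to_bytes(8, "big"))
        stream += block.digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def apply_combo(combo, data: bytes, label: bytes) -> bytes:
    """Apply the cipher once per key in the combination.

    XOR is self-inverse, so the same call both encrypts and decrypts.
    """
    for key in combo:
        data = keystream_xor(key, data, label)
    return data


k1, k2 = secrets.token_bytes(32), secrets.token_bytes(32)
combo = (k1, k2)                                  # one unique key combination
storage = {"bucket-7/obj-42": b"<encrypted portion>"}
index = {}

# First client: store the indexing ciphertext with the encrypted locator.
tag_ct = apply_combo(combo, b"holiday-2015", b"tag")
index[tag_ct] = apply_combo(combo, b"bucket-7/obj-42", b"loc")

# Second client: encrypt the query, fetch and decrypt the locator, then
# retrieve the portion from storage.
query_ct = apply_combo(combo, b"holiday-2015", b"tag")
location = apply_combo(combo, index[query_ct], b"loc").decode()
portion = storage[location]
```

A practical implementation would replace the toy cipher with a vetted symmetric scheme; the sketch is intended only to show that the same key combination both locates the index entry and decrypts its locator string.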

The file may be encrypted using any known technique, such as by applying a symmetric scheme, or an asymmetric scheme that uses a public/private key pair, or by using a different type of file encryption key. Alternatively, the file may be encrypted by applying a (q,n) threshold secret-sharing scheme, wherein n is the number of stored encrypted portions and q is the number of portions required to recover the file. In this case the second client may retrieve the file by submitting q or more search queries to retrieve q or more of the encrypted portions.
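
As one possible illustration of a (q,n) threshold scheme, the following sketch uses Shamir secret sharing over a prime field. The description does not name a particular scheme, so the choice of Shamir sharing, the field modulus, and the decision to share a small integer secret (for example a file key) rather than a whole file are all assumptions; make_shares and recover are hypothetical names.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def make_shares(secret: int, q: int, n: int) -> list[tuple[int, int]]:
    """Split the secret into n shares, any q of which recover it."""
    # Random polynomial of degree q-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(q - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


file_key = secrets.randbelow(PRIME)
shares = make_shares(file_key, q=3, n=5)   # n = 5 portions, q = 3 required
from_first_three = recover(shares[:3])
from_other_three = recover([shares[0], shares[2], shares[4]])
```

Any three of the five shares yield the same value, which corresponds to the second client submitting q or more search queries.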

Optionally, the first client may provide the second client with cipher information to allow the second client to generate the ciphertexts from the plaintext tag. For example, the cipher information may include the number of portions required to retrieve the file. Additionally or alternatively, the first client may provide the second client with the multiple different encryption keys for encrypting the plaintext tag. The first client may provide the second client with the file encryption key for decrypting the file. Alternatively, both the first and second clients may obtain any of the encryption keys from an authorized third party. Optionally, the first client may be configured with a key derivation function, and may derive any of the encryption keys for encrypting the file and/or the indexing terms using the function and a seed. For example, the seed may be the plaintext tag. Optionally, the second client is also configured with the key derivation function and may derive any of the encryption keys using the function and the seed. Optionally, the second client may derive the key required to decrypt the retrieved file portions from the key derivation function. Optionally, the first client provides the second client with the seed.
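
A minimal sketch of deriving the index keys from a shared seed is given below. The description does not specify the key derivation function; PBKDF2-HMAC-SHA-256 from the Python standard library is assumed here, and the per-key salt strings, the iteration count and the name derive_keys are hypothetical.

```python
import hashlib


def derive_keys(seed: str, count: int, iterations: int = 100_000) -> list[bytes]:
    """Derive `count` distinct 32-byte keys from a single seed.

    Each key uses a different salt, so the keys differ from one another
    while remaining reproducible by any client that holds the seed.
    """
    return [
        hashlib.pbkdf2_hmac(
            "sha256", seed.encode(), f"index-key-{i}".encode(), iterations
        )
        for i in range(count)
    ]


# Both clients run the same function on the same seed (here, the tag).
first_client_keys = derive_keys("holiday-2015", count=3)
second_client_keys = derive_keys("holiday-2015", count=3)
```

Because the derivation is deterministic, providing the seed (or using the plaintext tag as the seed) is sufficient for the second client to regenerate the same keys.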

Optionally, the number of encryption keys for deriving the ciphertexts is fewer than the number of stored encrypted portions. For example, the number of encrypted file portions n, and the number of index encryption keys u may be determined according to the relationship n≤2^(u)−1, corresponding to all the possible non-null combinations for the u keys. In this case, encrypting the plaintext tag into the multiple different ciphertexts comprises, for each ciphertext, encrypting the plaintext ν times using a different one of the 2^(u)−1 non-null combinations of the u encryption keys, where ν is the cardinality of the combination. Optionally, any of n and u may be selected in accordance with a constraint imposed on the cardinalities of the combinations, such as imposing a uniform distribution on the cardinalities, or a minimum, maximum, or constant cardinality.
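
The relationship n≤2^(u)−1 may be illustrated with the following sketch, in which u = 3 keys yield 7 non-null combinations, enough for n = 5 portions. Repeated application of a keyed hash (HMAC-SHA-256) is assumed as a stand-in for encrypting the plaintext ν times; the key values and the names non_null_combinations and encrypt_tag are hypothetical.

```python
import hashlib
import hmac
from itertools import combinations


def non_null_combinations(keys):
    """Yield all 2**u - 1 non-empty combinations of the u keys."""
    for size in range(1, len(keys) + 1):
        yield from combinations(keys, size)


def encrypt_tag(tag: bytes, combo) -> bytes:
    """Apply the keyed hash once per key, i.e. v times for cardinality v."""
    out = tag
    for key in combo:
        out = hmac.new(key, out, hashlib.sha256).digest()
    return out


keys = [b"key-A", b"key-B", b"key-C"]        # u = 3
n = 5                                        # portions; requires n <= 2**3 - 1
combos = list(non_null_combinations(keys))
if n > len(combos):
    raise ValueError("not enough key combinations for n portions")
ciphertexts = [encrypt_tag(b"holiday-2015", combo) for combo in combos[:n]]
```

A constraint on the cardinalities, such as a constant cardinality, could be imposed in this sketch by filtering combos on len(combo) before selecting n of them.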

Optionally, the combinations of the encryption keys used to derive the ciphertexts may be used to sort the retrieved file portions. Each ciphertext may be derived by encrypting the plaintext tag using a different combination of the encryption keys, where the encrypted file portions are sorted according to the combinations of the keys. For example, the file portions may be sorted into an array whose cells are indexed by the encryption keys, and thus each file portion is assigned a unique combination of the encryption keys. The second client may encrypt the search query in a similar manner: each ciphertext may be derived by encrypting the search query with one of the combinations of the encryption keys. The second client may create the same array as the first client and sort the retrieved portions according to the multiple combinations.

It may be appreciated that when the dimensions of the array correspond to the prime factors of the number of file portions, n, and are organized according to an order of the prime factors, such as smallest to largest, or vice versa, the array is unique. Thus, knowing n, both the first and second client can create the identical array, allowing the second client to sort the portions according to the key combination used to retrieve them and correctly recombine them to decrypt the file. Thus, the multiple combinations of the encryption keys may be determined according to the prime factors of n, and the number of keys u corresponds to the sum of the prime factors.
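
One possible reading of the prime-factor array is sketched below, under the assumption that each dimension of the array is allotted its own group of keys and that a cell is identified by one key index per dimension. For n = 12 the dimensions are 2, 2 and 3, so u = 2 + 2 + 3 = 7 keys address all 12 cells. The names prime_factors and key_combinations are hypothetical.

```python
from itertools import product


def prime_factors(n: int) -> list[int]:
    """Prime factors of n with multiplicity, smallest to largest."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def key_combinations(n: int):
    """Return (dimensions, number of keys u, key-index tuple per cell)."""
    dims = prime_factors(n)
    # Dimension i owns the key indices offsets[i] .. offsets[i] + dims[i] - 1.
    offsets = [sum(dims[:i]) for i in range(len(dims))]
    cells = [
        tuple(off + c for off, c in zip(offsets, coord))
        for coord in product(*(range(d) for d in dims))
    ]
    return dims, sum(dims), cells


dims, u, cells = key_combinations(12)
# dims is [2, 2, 3] and u is 7; cells[0] is (0, 2, 4), meaning the first
# portion is indexed with keys 0, 2 and 4.
```

Because both clients compute the same ordered list of cells from n alone, the position of a key combination in that list can serve as the position of the corresponding portion when the file is recombined.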

Optionally, the first client may separately index by indexing at multiple different indexes, and the second client may separately submit the multiple separate search queries by distributing the submissions over the multiple different indexes, such that no single index is queried for all the portions required to recover the file.

The first client may provide the second client with secure access to multiple files by querying for the same plaintext tag, and the second client may securely retrieve the multiple files by querying with a search query that matches the plaintext tag. The portions of the multiple files may include information that allows them to be sorted by the second client according to their respective files. The information may be encrypted such that the stored portions of the multiple files do not include any common terms that may be used to associate and/or identify those portions with each other.

Reference is now made to FIG. 5, which shows an exemplary system 500 according to an embodiment. System 500 may include a computing device 510. Computing device 510 may include a hardware processor 520, a storage device 530 and an optional input/output (“I/O”) device 540. Hardware processor 520 may include one or more hardware processors, storage device 530 may include one or more storage devices and I/O device 540 may include one or more I/O devices. Hardware processor 520 may be configured to execute the method of FIG. 1. I/O device 540 may be configured to allow a user to interact with system 500. For example, I/O device 540 may include a display, a loudspeaker and/or a printer which may for example output a list of evidence for a user according to the method of FIG. 1. Dedicated software, implementing the method of FIG. 1, may be stored on storage device 530 and executed by hardware processor 520.

In some embodiments, computing device 510 may include an I/O device 540 such as a terminal, a display, a keyboard, a mouse, a touch screen, a recorder, a loudspeaker, a printer, an input device and/or the like to interact with system 500, to invoke system 500 and to receive results. It will however be appreciated that system 500 may operate without human operation and without I/O device 540. In some exemplary embodiments of the disclosed subject matter, storage device 530 may include or be loaded with a user interface. The user interface may be utilized to receive input, such as a context and optionally a content resource and/or provide output, such as a list of evidence, to and from system 500, including receiving specific user commands or parameters related to system 500, providing output, or the like.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is:
 1. A system for providing query-based access to a file, comprising: at least one storage device; at least one hardware processor; and at least one non-transitory memory device having embodied thereon program code executable by said at least one hardware processor to: provide query-based access to a file by: creating multiple encrypted portions of said file, deriving multiple different ciphertexts from a plaintext term associated with said file using multiple different index keys, associating each one of said multiple encrypted portions with a different one of said multiple different ciphertexts, separately storing each one of said multiple encrypted portions at said at least one storage device, and indexing each one of said multiple stored encrypted portions with said associated one of said multiple different ciphertexts, and retrieve said file by: deriving at least some of said multiple different ciphertexts from said plaintext term using at least some of said multiple different index keys, submitting said at least some of said multiple different ciphertexts as multiple different search queries, for each submitted one of said multiple different ciphertexts, retrieving from said at least one storage device, said associated one of said multiple encrypted portions, and recovering said file from said multiple retrieved encrypted portions.
 2. The system of claim 1, wherein said program code is further executable to derive said multiple different ciphertexts by hashing said plaintext term using said multiple different index keys.
 3. The system of claim 1, wherein said program code is executable to associate each one of said multiple encrypted portions with said different one of said multiple different ciphertexts by creating a map between said multiple different index keys and said multiple encrypted portions.
 4. The system of claim 3, wherein said program code is further executable to: derive said multiple different index keys from a seed according to a key sequence and create said map by indexing said map with said multiple different index keys according to said key sequence.
 5. The system of claim 7, further comprising a first client computer comprising a first one of said at least one hardware processor and a first one of said at least one non-transitory memory device, and a second client computer comprising a second one of said at least one hardware processor and a second one of said at least one non-transitory memory device, wherein said program code is further executable by said first client computer to provide said query-based access to said file to said second client computer by providing said seed to said second client computer.
 6. The system of claim 3, wherein said multiple different index keys are fewer than said multiple encrypted portions, wherein said map maps a unique combination of said multiple different index keys to each one of said multiple encrypted portions.
 7. The system of claim 3, wherein said program code is further executable to: index each one of said multiple stored encrypted portions by encrypting a storage location string of each one of said multiple stored encrypted portions according to said map, and retrieve each associated one of said multiple encrypted portions from said at least one storage device by decrypting said storage location string of said associated one of said multiple encrypted portions according to said map.
 8. The system of claim 3, wherein said program code is further executable to create said multiple encrypted portions of said file by: encrypting said file with a file encryption key, and partitioning said encrypted file into said multiple encrypted portions, wherein combining said multiple encrypted portions according to a correct sequence defined by said map reconstructs said encrypted file, and wherein said program code is further executable to recover said file from said multiple retrieved encrypted portions by: combining said multiple retrieved encrypted portions according to said correct sequence defined by said map to reconstruct said encrypted file, and decrypting said encrypted file with said file encryption key.
 9. A method for providing query-based access to a file, comprising: creating multiple encrypted portions of a file; deriving multiple different ciphertexts from a plaintext term associated with said file using multiple different index keys; associating each one of said multiple encrypted portions with a different one of said multiple different ciphertexts; separately storing each one of said multiple encrypted portions; and indexing each one of said multiple stored encrypted portions with said associated one of said multiple different ciphertexts.
 10. The method of claim 9, wherein deriving said multiple different ciphertexts comprises hashing said plaintext term using said multiple different index keys.
 11. The method of claim 9, wherein associating each one of said multiple encrypted portions with said different one of said multiple different ciphertexts comprises creating a map between said multiple different index keys and said multiple encrypted portions.
 12. The method of claim 11, further comprising deriving said multiple different index keys from a seed according to a key sequence, wherein creating said map comprises indexing said map with said multiple different index keys according to said key sequence.
 13. The method of claim 11, wherein indexing each one of said multiple stored encrypted portions comprises encrypting a storage location string of each one of said multiple stored encrypted portions according to said map.
 14. The method of claim 11, wherein creating said multiple encrypted portions of said file comprises: encrypting said file with a file encryption key, and partitioning said encrypted file into said multiple encrypted portions, wherein combining said multiple encrypted portions according to a correct sequence defined by said map reconstructs said encrypted file.
 15. A method for retrieving a file, comprising: deriving multiple different ciphertexts from a plaintext term associated with a file using multiple different index keys, wherein each one of multiple encrypted portions of said file is associated with a different one of said multiple different ciphertexts; submitting said multiple different ciphertexts as multiple different search queries; for each submitted one of said multiple different ciphertexts, retrieving said associated one of said multiple encrypted portions; and recovering said file from said multiple retrieved encrypted portions.
 16. The method of claim 15, wherein deriving said multiple different ciphertexts comprises hashing said plaintext term using said multiple different index keys.
 17. The method of claim 15, further comprising associating each one of said multiple encrypted portions with said different one of said multiple different ciphertexts by creating a map between said multiple different index keys and said multiple encrypted portions.
 18. The method of claim 17, further comprising obtaining a seed and deriving said multiple different index keys from said seed according to a key sequence, wherein creating said map comprises indexing said map with said multiple different index keys according to said key sequence.
 19. The method of claim 17, wherein retrieving each associated one of said multiple encrypted portions further comprises decrypting a storage location string of said associated one of said multiple encrypted portions according to said map.
 20. The method of claim 17, wherein recovering said file from said multiple retrieved encrypted portions comprises: combining said multiple retrieved encrypted portions according to a correct sequence defined by said map to reconstruct an encrypted file, and decrypting said encrypted file with a file encryption key.